Describing Images using Inferred Visual Dependency Representations

Authors

  • Desmond Elliott
  • Arjen P. de Vries
Abstract

The Visual Dependency Representation (VDR) is an explicit model of the spatial relationships between objects in an image. In this paper we present an approach to training a VDR Parsing Model without the extensive human supervision used in previous work. Our approach is to find the objects mentioned in a given description using a state-of-the-art object detector, and to use successful detections to produce training data. The description of an unseen image is produced by first predicting its VDR over automatically detected objects, and then generating the text with a template-based generation model using the predicted VDR. The performance of our approach is comparable to a state-of-the-art multimodal deep neural network on images depicting actions.
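
The training-data step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` type, the `min_score` threshold, the helper names, and the coarse relation labels ("surrounds", "above", "below", "beside") are all hypothetical choices made for illustration, and the detector output is assumed to be a list of labelled, scored bounding boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str    # object class predicted by the (assumed) detector
    score: float  # detector confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max); y grows downward


def grounded_detections(description_nouns, detections, min_score=0.5):
    """Keep the highest-scoring detection for each noun in the description.

    min_score is a hypothetical cut-off for a "successful" detection.
    """
    best = {}
    for det in detections:
        if det.label in description_nouns and det.score >= min_score:
            if det.label not in best or det.score > best[det.label].score:
                best[det.label] = det
    return best


def spatial_relation(a, b):
    """Coarse, illustrative relation of box a with respect to box b."""
    ax0, ay0, ax1, ay1 = a.box
    bx0, by0, bx1, by1 = b.box
    if ax0 <= bx0 and ay0 <= by0 and ax1 >= bx1 and ay1 >= by1:
        return "surrounds"
    if ay1 <= by0:
        return "above"
    if ay0 >= by1:
        return "below"
    return "beside"


def training_example(description_nouns, detections, subject, obj):
    """Return a (subject, relation, object) triple, or None if detection fails."""
    found = grounded_detections(description_nouns, detections)
    if subject not in found or obj not in found:
        return None  # an undetected object means no training data is produced
    return (subject, spatial_relation(found[subject], found[obj]), obj)
```

Under these assumptions, a description mentioning a person and a bike yields a training triple only when both objects are detected with enough confidence; otherwise the image is skipped, mirroring the abstract's reliance on successful detections.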

Related resources

Image Description using Visual Dependency Representations

Describing the main event of an image involves identifying the objects depicted and predicting the relationships between them. Previous approaches have represented images as unstructured bags of regions, which makes it difficult to accurately predict meaningful relationships between regions. In this paper, we introduce visual dependency representations to capture the relationships between the o...

Query-by-Example Image Retrieval using Visual Dependency Representations

Image retrieval models typically represent images as bags-of-terms, a representation that is well-suited to matching images based on the presence or absence of terms. For some information needs, such as searching for images of people performing actions, it may be useful to retain data about how parts of an image relate to each other. If the underlying representation of an image can distinguish b...

A Treebank of Visual and Linguistic Data

The treebank is a new resource for researchers working at the intersection between vision and language. It will be a freely-available corpus of images and corresponding text for the development and evaluation of models for natural language generation, image annotation, and structure induction. The treebank differs from existing datasets because it contains syntactic representations of the data,...

Face Shape Recovery from a Single Image View

The problem of acquiring surface models of faces is an important one with potentially significant applications in biometrics, computer games and production graphics. For such a task, the use of shape-from-shading (SFS) is appealing since it is a non-invasive method that mimics the capabilities of the human visual system. In this thesis, our interest lies in the recovery of facial shape from singl...

The Rhetorical - Aesthetic Approach to Constructing the Relation between Images and Visual Inventions with Global Politics

Images and photos play an important role in our understanding of domestic and international events. Today we are living in the age of the visualization of politics. The images are vague, rhetorical, and aesthetic components of political and social phenomena and can give them a beautiful or detestable structure. In the digital age, images in and of themselves can define our structure and vision ...

Publication date: 2015